Conversation
@scotts scotts commented Oct 27, 2025

Not ready for merging or final review. This is a demonstration of accepting both TorchVision transforms and defining complementary specifications in TorchCodec. The purpose of this PR, right now, is to get agreement on whether this is the right approach. If we can agree, I'll polish everything and write more tests.

For context, in #526, I had initially proposed not using TorchVision transforms, and instead coming up with TorchCodec specific versions. @NicolasHug proposed that we accept TorchVision transforms, and that's what I followed up with in my design in #885.

After discussing the previous iteration of this PR, we agreed we wanted to see what it would look like to accept both. Having implemented this, I agree it's the right thing to do:

  1. We no longer need to require TorchVision, even when using the decoder-native feature.
  2. We have a natural place to document the behavior of each decoder-native transform that we accept, and what its limitations are compared to the TorchVision version of that transform.
  3. We have a more principled mechanism for enforcing how TorchVision transforms map to decoder-native semantics. We still have to dig into the TorchVision object to get the info we need, but the torchcodec.transforms class is a clear representation in code of what is supported. In the old PR, that mapping was buried in the logic that turned the TorchVision transform directly into the specification string the core API needs.
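The mapping described in point 3 might look something like the sketch below. All names here are illustrative, not the actual TorchCodec API: a decoder-native transform class owns the conversion to the core API's specification string, and a separate helper digs into the TorchVision object to extract only the parameters the decoder-native version supports.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Resize:
    """Illustrative decoder-native resize. The real class would document
    its behavior and its limitations relative to the TorchVision Resize."""

    size: Tuple[int, int]  # (height, width)

    def _core_spec(self) -> str:
        # The core API consumes a string specification.
        height, width = self.size
        return f"resize, {height}, {width}"


def resize_from_torchvision(tv_resize) -> Resize:
    # Dig into the TorchVision object for the parameters we support,
    # rejecting anything decoder-native resizing cannot express.
    if not isinstance(tv_resize.size, (list, tuple)) or len(tv_resize.size) != 2:
        raise ValueError("decoder-native Resize requires an explicit (H, W) size")
    return Resize(size=tuple(tv_resize.size))
```

The point of this shape is that the supported surface lives in the dataclass itself, rather than being buried in string-building logic.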

I can easily be convinced to use torchcodec.transforms.Transform instead of torchcodec.transforms.DecoderNativeTransform. My motivation there is to make it less likely that users will confuse these with TorchVision's transforms. We also might be able to think of a better user-facing name for this feature, and these transforms, than "decoder native." Some concrete questions:

  1. Should we use Transform, DecoderNativeTransform or some other name for the base class name?
  2. Should the module path be torchcodec.transforms or torchcodec.decoder_native_transforms or something else? Putting it in the module path may make it easier to just use Transform. And, for that matter, Resize.

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 27, 2025
dimension_order: Literal["NCHW", "NHWC"] = "NCHW",
num_ffmpeg_threads: int = 1,
device: Optional[Union[str, torch_device]] = "cpu",
transforms: List[Any] = [], # TRANSFORMS TODO: what is the user-facing type?
@scotts (Contributor, Author) commented:
Discussion point 1: If we accept TorchVision transforms, and we want to lazily load TorchVision, what type do we advertise here? We can easily explain that we accept a TorchVision transform in the docs, but what should we put in the type annotation?

@NicolasHug (Contributor) commented Oct 28, 2025:

It should probably be either Any or nn.Module. nn.Module is the base class of all torchvision v2 transforms, and something users are familiar with, since it is the core building block of any PyTorch model.

@scotts (Contributor, Author) replied:
Oh, that solves the problem nicely: it can definitely be nn.Module.

show_error_codes = True
pretty = True
allow_redefinition = True
follow_untyped_imports = True
